KMID : 1022420150070040003
Phonetics and Speech Sciences, 2015, Vol. 7, No. 4, pp. 3-9
Input Dimension Reduction based on Continuous Word Vector for Deep Neural Network Language Model
Kim Kwang-Ho, Lee Dong-Hyun, Lim Min-Kyu, Kim Ji-Hwan
Abstract
In this paper, we investigate an input dimension reduction method using continuous word vectors in a deep neural network language model. In the proposed method, continuous word vectors are generated with Google's Word2Vec from a large training corpus, so that they satisfy the distributional hypothesis. The 1-of-|V| coded discrete word vectors are then replaced with their corresponding continuous word vectors. In our implementation, the input dimension was successfully reduced from 20,000 to 600 when a tri-gram language model was used with a vocabulary of 20,000 words. The total training time on the Wall Street Journal training corpus (corpus length: 37M words) was reduced from 30 days to 14 days.
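The core substitution described in the abstract can be sketched as follows. This is an illustrative toy example, not the paper's implementation: the vocabulary, the embedding dimension, and the embedding values are all stand-ins (the paper uses a 20,000-word vocabulary and Word2Vec-trained vectors), and the function names are hypothetical.

```python
# Sketch: replacing 1-of-|V| discrete word inputs with continuous word
# vectors at the input layer of a tri-gram neural LM.
# VOCAB and EMB_DIM are illustrative stand-ins, not the paper's values.

VOCAB = ["the", "cat", "sat", "on", "mat"]   # stand-in for a 20,000-word vocabulary
EMB_DIM = 3                                   # stand-in for a low-dimensional Word2Vec vector

# Hypothetical pre-trained embeddings; in the paper these come from Word2Vec.
embeddings = {w: [0.1 * (i + j) for j in range(EMB_DIM)]
              for i, w in enumerate(VOCAB)}

def one_hot_input(history):
    """Discrete coding: each history word becomes a |V|-dim 1-of-|V| vector."""
    vecs = []
    for w in history:
        v = [0.0] * len(VOCAB)
        v[VOCAB.index(w)] = 1.0
        vecs.extend(v)
    return vecs

def continuous_input(history):
    """Continuous coding: each history word becomes an EMB_DIM-dim vector."""
    vecs = []
    for w in history:
        vecs.extend(embeddings[w])
    return vecs

# A tri-gram LM conditions on the 2 preceding words.
history = ["the", "cat"]
print(len(one_hot_input(history)))     # 2 * |V|
print(len(continuous_input(history)))  # 2 * EMB_DIM
```

The input layer thus shrinks from a multiple of |V| to a multiple of the embedding dimension, which is what drives the reported reduction in training time.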
KEYWORD
deep neural network, language model, continuous word vector, input dimension reduction